




A Adversarial Attack. Given a natural example x

Neural Information Processing Systems

Here, we only name a few.

B.1 Pseudo-code of the Visualization Method. As shown in Algorithm 1 for the visualization of the weight loss landscape, we first sample a random direction. Then, we apply the "filter normalization" technique. Thus, we adopt the 1-D visualization in most cases. We adversarially train PreAct ResNet-18 with different learning rate schedules using the same experimental settings as in Section 3. The learning curves are shown in the left column of Figure 7, where the whole training process can be split into two stages: an early stage with a small robust generalization gap, and a later stage in which the gap grows and the weight loss landscape becomes correspondingly sharper. The cyclic schedule starts to significantly enlarge the gap much later, almost after the 175-th epoch with lr < 0.16. The previous experiments are all based on PreAct ResNet-18. The same experimental settings as in Section 3 are adopted, and the results are shown in Figure 8.
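The visualization method described above can be sketched in a few lines. This is a minimal NumPy illustration of the general idea, not the paper's code: `filter_normalize` rescales each filter (row) of a random direction to match the norm of the corresponding filter in the trained weights, and `loss_along_direction` evaluates the loss on a 1-D grid L(w + α·d); the toy MSE model and all function names are my own assumptions.

```python
import numpy as np

def filter_normalize(direction, weights):
    """Rescale each filter (row) of a random direction so its norm
    matches the corresponding filter of the trained weights.
    Sketch of the 'filter normalization' idea, assuming 2-D weights."""
    d = direction.copy()
    for i in range(d.shape[0]):
        d[i] *= np.linalg.norm(weights[i]) / (np.linalg.norm(d[i]) + 1e-12)
    return d

def loss_along_direction(loss_fn, weights, direction, alphas):
    """Evaluate L(w + alpha * d) on a 1-D grid of alpha values."""
    return [loss_fn(weights + a * direction) for a in alphas]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 10))          # toy inputs
    y = rng.normal(size=(64, 3))           # toy targets
    W = rng.normal(size=(3, 10))           # toy "trained" weights

    def mse_loss(w):
        return float(np.mean((X @ w.T - y) ** 2))

    d = filter_normalize(rng.normal(size=W.shape), W)
    curve = loss_along_direction(mse_loss, W, d, np.linspace(-1.0, 1.0, 21))
    print(len(curve))  # 21 loss values along the 1-D slice
```

Plotting `curve` against the α grid gives the 1-D weight loss landscape; a sharper curve around α = 0 corresponds to the sharper landscape discussed above.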



Common Q1: Theoretical justification on why AWP works

Neural Information Processing Systems

Common Q1: Theoretical justification on why AWP works. Based on previous work on the PAC-Bayes bound (Neyshabur et al., NeurIPS 2017), in adversarial training, let R

R#1 Q1: The weights are constantly perturbed in the worst case; the model may find it difficult to learn.

R#1 Q2: How do the baseline methods that do implicit weight perturbations differ from AWP? We did not claim that "baseline methods do the implicit weight perturbations".

R#1 Q3: What is the difference between the weights learned by AT-AWP and vanilla AT?

R#2 Q1: Only CIFAR-10 and single neural networks are tested. We have tested several network architectures and datasets in the main body and appendix, e.g., PreAct ResNet-18,

R#2 Q2: In Figure 1, is the α value in the loss landscape embedded during training or post-training?
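The double-perturbation structure of AWP discussed in this rebuttal can be sketched as a single training step: an inner maximization over the input (one FGSM-style step here, for brevity), an inner maximization over the weights (one ascent step whose size is relative to the weight norm), and an outer descent step taken at the perturbed weights. This is a minimal NumPy sketch of the idea under my own assumptions (toy linear model, closed-form gradients, single-step inner maximizations), not the paper's implementation; `awp_step`, `eps`, and `gamma` are illustrative names.

```python
import numpy as np

def awp_step(W, X, y, loss_grad_x, loss_grad_w, eps=0.1, gamma=0.01, lr=0.05):
    """One sketched adversarial-weight-perturbation training step.

    1. Inner max over inputs: one sign-gradient step of size eps.
    2. Inner max over weights: one ascent step of relative size gamma.
    3. Outer min: descend using the gradient at the perturbed weights,
       then drop the perturbation (W itself is updated, not W + v).
    """
    X_adv = X + eps * np.sign(loss_grad_x(W, X, y))
    g_w = loss_grad_w(W, X_adv, y)
    v = gamma * np.linalg.norm(W) * g_w / (np.linalg.norm(g_w) + 1e-12)
    return W - lr * loss_grad_w(W + v, X_adv, y)

if __name__ == "__main__":
    # Toy linear regression with MSE loss; gradients are closed-form.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(32, 5))
    y = X @ rng.normal(size=5)
    gx = lambda W, X, y: 2 * (X @ W - y)[:, None] * W / len(y)   # d loss / d X
    gw = lambda W, X, y: 2 * X.T @ (X @ W - y) / len(y)          # d loss / d W
    W = np.zeros(5)
    for _ in range(200):
        W = awp_step(W, X, y, gx, gw)
    print(float(np.mean((X @ W - y) ** 2)))  # clean loss after training
```

This also makes R#1 Q1 concrete: the weight perturbation v is only used to compute the update direction and is removed afterwards, so the model trains on a flattened objective rather than on permanently perturbed weights.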


Self-Supervised Adversarial Training via Diverse Augmented Queries and Self-Supervised Double Perturbation

Neural Information Processing Systems

Our work can be seamlessly combined with models pretrained by different SSL frameworks without revising the learning objectives and helps to bridge the gap between SAT and AT. Our method also improves both robust